Implement migration sequencing (phase 1) #2980

ericvergnaud · 2024-10-16T10:56:10Z

Changes

Users need the ability to sequence migration tasks.
This PR implements sequencing for workflow tasks, jobs and clusters.
It will be followed by PRs for other workspace items.

Linked issues

Progresses #1415

Functionality

None

Tests

added unit tests

github-actions · 2024-10-16T11:26:37Z

❌ 197/202 passed, 1 flaky, 5 failed, 11 skipped, 4h38m11s total

❌ test_create_catalog_schema_with_legacy_hive_metastore_privileges: TypeError: CatalogSchema.create_all_catalogs_schemas() got an unexpected keyword argument 'properties' (13.323s)

TypeError: CatalogSchema.create_all_catalogs_schemas() got an unexpected keyword argument 'properties'
[gw5] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python

❌ test_compare_remote_local_install_versions: Failed: DID NOT RAISE (38.416s)

Failed: DID NOT RAISE <class 'RuntimeWarning'>
[gw8] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
16:26 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.aqc1/config.yml) doesn't exist.
16:26 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
16:26 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
16:26 INFO [databricks.labs.ucx.install] Fetching installations...
16:26 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
16:26 DEBUG [tests.integration.conftest] Waiting for clusters to start...
16:26 DEBUG [tests.integration.conftest] Waiting for clusters to start...
16:26 INFO [databricks.labs.ucx.install] Installing UCX v0.45.1+3120241017162616
16:26 INFO [databricks.labs.ucx.install] Creating ucx schemas...
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-data-reconciliation
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups-experimental
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=experimental-workflow-linter
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=remove-workspace-local-backup-groups
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=assessment
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-tables-ctas
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=scan-tables-in-mounts-experimental
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=failing
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=validate-groups-permissions
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-hiveserde-tables-in-place-experimental
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables-in-mounts-experimental
16:26 INFO [databricks.labs.ucx.install] Creating dashboards...
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migration-progress-experimental
16:26 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/views...
16:26 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment...
16:26 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/interactive...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/estimates...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/main...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/CLOUD_ENV...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/groups...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/main...
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.aqc1/README for the next steps.
16:26 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.aqc1/config.yml) doesn't exist.
16:26 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
16:26 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
16:26 INFO [databricks.labs.ucx.install] Fetching installations...
16:26 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
16:26 DEBUG [tests.integration.conftest] Waiting for clusters to start...
16:26 DEBUG [tests.integration.conftest] Waiting for clusters to start...
16:26 INFO [databricks.labs.ucx.install] Installing UCX v0.45.1+3120241017162616
16:26 INFO [databricks.labs.ucx.install] Creating ucx schemas...
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-data-reconciliation
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups-experimental
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=experimental-workflow-linter
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=remove-workspace-local-backup-groups
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=assessment
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-tables-ctas
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=scan-tables-in-mounts-experimental
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=failing
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=validate-groups-permissions
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-hiveserde-tables-in-place-experimental
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables-in-mounts-experimental
16:26 INFO [databricks.labs.ucx.install] Creating dashboards...
16:26 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migration-progress-experimental
16:26 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/views...
16:26 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment...
16:26 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/interactive...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/estimates...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/main...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/CLOUD_ENV...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/groups...
16:26 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/main...
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
16:26 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.aqc1/README for the next steps.
16:26 INFO [databricks.labs.ucx.install] Deleting UCX v0.45.1+3120241017162616 from https://DATABRICKS_HOST
16:26 INFO [databricks.labs.ucx.install] Deleting inventory database dummy_s14qs
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=445289596053554, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=916319574573885, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=938797551991197, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=548542372208228, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=148048249464210, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=577411837766842, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=408879298192124, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=7138763586467, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=133599915887279, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1012112652960536, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=359498217761337, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=693232750147055, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=278642487109511, as it is no longer needed
16:26 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1028762862695397, as it is no longer needed
16:26 INFO [databricks.labs.ucx.install] Deleting cluster policy
16:26 INFO [databricks.labs.ucx.install] Deleting secret scope
16:26 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
[gw8] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python

❌ test_create_all_catalogs_schemas: TypeError: CatalogSchema.create_all_catalogs_schemas() got an unexpected keyword argument 'properties' (9.782s)

TypeError: CatalogSchema.create_all_catalogs_schemas() got an unexpected keyword argument 'properties'
[gw1] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python

❌ test_instance_pools[False]: databricks.sdk.errors.platform.InvalidParameterValue: Node type Standard_D4pds_v6 is not supported. Supported node types: Standard_DS3_v2, Standard_DS4_v2, Standard_DS5_v2, Standard_D4s_v3, Standard_D8s_v3, Standard_D16s_v3, Standard_D32s_v3, Standard_D64s_v3, Standard_D4a_v4, Standard_D8a_v4, Standard_D16a_v4, Standard_D32a_v4, Standard_D48a_v4, Standard_D64a_v4, Standard_D96a_v4, Standard_D8as_v4, Standard_D16as_v4, Standard_D32as_v4, Standard_D48as_v4, Standard_D64as_v4, Standard_D96as_v4, Standard_D4ds_v4, Standard_D8ds_v4, Standard_D16ds_v4, Standard_D32ds_v4, Standard_D48ds_v4, Standard_D64ds_v4, Standard_D3_v2, Standard_D4_v2, Standard_D5_v2, Standard_D8_v3, Standard_D16_v3, Standard_D32_v3, Standard_D64_v3, Standard_D4as_v5, Standard_D8as_v5, Standard_D16as_v5, Standard_D32as_v5, Standard_D48as_v5, Standard_D64as_v5, Standard_D96as_v5, Standard_D4ads_v5, Standard_D8ads_v5, Standard_D16ads_v5, Standard_D32ads_v5, Standard_D48ads_v5, Standard_D64ads_v5, Standard_D96ads_v5, Standard_D4d_v4, Standard_D8d_v4, Standard_D16d_v4, Standard_D32d_v4, Standard_D48d_v4, Standard_D64d_v4, Standard_D12_v2, Standard_D13_v2, Standard_D14_v2, Standard_D15_v2, Standard_DS12_v2, Standard_DS13_v2, Standard_DS14_v2, Standard_DS15_v2, Standard_E8_v3, Standard_E16_v3, Standard_E32_v3, Standard_E64_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E64s_v3, Standard_E4d_v4, Standard_E8d_v4, Standard_E16d_v4, Standard_E20d_v4, Standard_E32d_v4, Standard_E48d_v4, Standard_E64d_v4, Standard_E4ds_v4, Standard_E8ds_v4, Standard_E16ds_v4, Standard_E20ds_v4, Standard_E32ds_v4, Standard_E48ds_v4, Standard_E64ds_v4, Standard_E80ids_v4, Standard_E4a_v4, Standard_E8a_v4, Standard_E16a_v4, Standard_E20a_v4, Standard_E32a_v4, Standard_E48a_v4, Standard_E64a_v4, Standard_E96a_v4, Standard_E4as_v4, Standard_E8as_v4, Standard_E16as_v4, Standard_E20as_v4, Standard_E32as_v4, Standard_E48as_v4, Standard_E64as_v4, Standard_E96as_v4, Standard_E4s_v4, 
Standard_E8s_v4, Standard_E16s_v4, Standard_E20s_v4, Standard_E32s_v4, Standard_E48s_v4, Standard_E64s_v4, Standard_E80is_v4, Standard_E4as_v5, Standard_E8as_v5, Standard_E16as_v5, Standard_E20as_v5, Standard_E32as_v5, Standard_E48as_v5, Standard_E64as_v5, Standard_E96as_v5, Standard_E4ads_v5, Standard_E8ads_v5, Standard_E16ads_v5, Standard_E20ads_v5, Standard_E32ads_v5, Standard_E48ads_v5, Standard_E64ads_v5, Standard_E96ads_v5, Standard_L4s, Standard_L8s, Standard_L16s, Standard_L32s, Standard_F4, Standard_F8, Standard_F16, Standard_F4s, Standard_F8s, Standard_F16s, Standard_H8, Standard_H16, Standard_F4s_v2, Standard_F8s_v2, Standard_F16s_v2, Standard_F32s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_NC12, Standard_NC24, Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC4as_T4_v3, Standard_NC8as_T4_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_ND96asr_v4, Standard_L8s_v2, Standard_L16s_v2, Standard_L32s_v2, Standard_L64s_v2, Standard_L80s_v2, Standard_NV36ads_A10_v5, Standard_NV36adms_A10_v5, Standard_NV72ads_A10_v5, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_D4s_v5, Standard_D8s_v5, Standard_D16s_v5, Standard_D32s_v5, Standard_D48s_v5, Standard_D64s_v5, Standard_D96s_v5, Standard_D4ds_v5, Standard_D8ds_v5, Standard_D16ds_v5, Standard_D32ds_v5, Standard_D48ds_v5, Standard_D64ds_v5, Standard_D96ds_v5, Standard_E4s_v5, Standard_E8s_v5, Standard_E16s_v5, Standard_E20s_v5, Standard_E32s_v5, Standard_E48s_v5, Standard_E64s_v5, Standard_E96s_v5, Standard_E4ds_v5, Standard_E8ds_v5, Standard_E16ds_v5, Standard_E20ds_v5, Standard_E32ds_v5, Standard_E48ds_v5, Standard_E64ds_v5, Standard_E96ds_v5, Standard_L8s_v3, Standard_L16s_v3, Standard_L32s_v3, Standard_L48s_v3, Standard_L64s_v3, Standard_L80s_v3, Standard_L8as_v3, Standard_L16as_v3, Standard_L32as_v3, Standard_L48as_v3, Standard_L64as_v3, Standard_L80as_v3, Standard_DC4as_v5, Standard_DC8as_v5, Standard_DC16as_v5, Standard_DC32as_v5, 
Standard_EC8as_v5, Standard_EC16as_v5, Standard_EC32as_v5, Standard_EC8ads_v5, Standard_EC16ads_v5, Standard_EC32ads_v5 (9.221s)

databricks.sdk.errors.platform.InvalidParameterValue: Node type Standard_D4pds_v6 is not supported. Supported node types: Standard_DS3_v2, Standard_DS4_v2, Standard_DS5_v2, Standard_D4s_v3, Standard_D8s_v3, Standard_D16s_v3, Standard_D32s_v3, Standard_D64s_v3, Standard_D4a_v4, Standard_D8a_v4, Standard_D16a_v4, Standard_D32a_v4, Standard_D48a_v4, Standard_D64a_v4, Standard_D96a_v4, Standard_D8as_v4, Standard_D16as_v4, Standard_D32as_v4, Standard_D48as_v4, Standard_D64as_v4, Standard_D96as_v4, Standard_D4ds_v4, Standard_D8ds_v4, Standard_D16ds_v4, Standard_D32ds_v4, Standard_D48ds_v4, Standard_D64ds_v4, Standard_D3_v2, Standard_D4_v2, Standard_D5_v2, Standard_D8_v3, Standard_D16_v3, Standard_D32_v3, Standard_D64_v3, Standard_D4as_v5, Standard_D8as_v5, Standard_D16as_v5, Standard_D32as_v5, Standard_D48as_v5, Standard_D64as_v5, Standard_D96as_v5, Standard_D4ads_v5, Standard_D8ads_v5, Standard_D16ads_v5, Standard_D32ads_v5, Standard_D48ads_v5, Standard_D64ads_v5, Standard_D96ads_v5, Standard_D4d_v4, Standard_D8d_v4, Standard_D16d_v4, Standard_D32d_v4, Standard_D48d_v4, Standard_D64d_v4, Standard_D12_v2, Standard_D13_v2, Standard_D14_v2, Standard_D15_v2, Standard_DS12_v2, Standard_DS13_v2, Standard_DS14_v2, Standard_DS15_v2, Standard_E8_v3, Standard_E16_v3, Standard_E32_v3, Standard_E64_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E64s_v3, Standard_E4d_v4, Standard_E8d_v4, Standard_E16d_v4, Standard_E20d_v4, Standard_E32d_v4, Standard_E48d_v4, Standard_E64d_v4, Standard_E4ds_v4, Standard_E8ds_v4, Standard_E16ds_v4, Standard_E20ds_v4, Standard_E32ds_v4, Standard_E48ds_v4, Standard_E64ds_v4, Standard_E80ids_v4, Standard_E4a_v4, Standard_E8a_v4, Standard_E16a_v4, Standard_E20a_v4, Standard_E32a_v4, Standard_E48a_v4, Standard_E64a_v4, Standard_E96a_v4, Standard_E4as_v4, Standard_E8as_v4, Standard_E16as_v4, Standard_E20as_v4, Standard_E32as_v4, Standard_E48as_v4, Standard_E64as_v4, Standard_E96as_v4, Standard_E4s_v4, Standard_E8s_v4, Standard_E16s_v4, 
Standard_E20s_v4, Standard_E32s_v4, Standard_E48s_v4, Standard_E64s_v4, Standard_E80is_v4, Standard_E4as_v5, Standard_E8as_v5, Standard_E16as_v5, Standard_E20as_v5, Standard_E32as_v5, Standard_E48as_v5, Standard_E64as_v5, Standard_E96as_v5, Standard_E4ads_v5, Standard_E8ads_v5, Standard_E16ads_v5, Standard_E20ads_v5, Standard_E32ads_v5, Standard_E48ads_v5, Standard_E64ads_v5, Standard_E96ads_v5, Standard_L4s, Standard_L8s, Standard_L16s, Standard_L32s, Standard_F4, Standard_F8, Standard_F16, Standard_F4s, Standard_F8s, Standard_F16s, Standard_H8, Standard_H16, Standard_F4s_v2, Standard_F8s_v2, Standard_F16s_v2, Standard_F32s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_NC12, Standard_NC24, Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC4as_T4_v3, Standard_NC8as_T4_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_ND96asr_v4, Standard_L8s_v2, Standard_L16s_v2, Standard_L32s_v2, Standard_L64s_v2, Standard_L80s_v2, Standard_NV36ads_A10_v5, Standard_NV36adms_A10_v5, Standard_NV72ads_A10_v5, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_D4s_v5, Standard_D8s_v5, Standard_D16s_v5, Standard_D32s_v5, Standard_D48s_v5, Standard_D64s_v5, Standard_D96s_v5, Standard_D4ds_v5, Standard_D8ds_v5, Standard_D16ds_v5, Standard_D32ds_v5, Standard_D48ds_v5, Standard_D64ds_v5, Standard_D96ds_v5, Standard_E4s_v5, Standard_E8s_v5, Standard_E16s_v5, Standard_E20s_v5, Standard_E32s_v5, Standard_E48s_v5, Standard_E64s_v5, Standard_E96s_v5, Standard_E4ds_v5, Standard_E8ds_v5, Standard_E16ds_v5, Standard_E20ds_v5, Standard_E32ds_v5, Standard_E48ds_v5, Standard_E64ds_v5, Standard_E96ds_v5, Standard_L8s_v3, Standard_L16s_v3, Standard_L32s_v3, Standard_L48s_v3, Standard_L64s_v3, Standard_L80s_v3, Standard_L8as_v3, Standard_L16as_v3, Standard_L32as_v3, Standard_L48as_v3, Standard_L64as_v3, Standard_L80as_v3, Standard_DC4as_v5, Standard_DC8as_v5, Standard_DC16as_v5, Standard_DC32as_v5, Standard_EC8as_v5, Standard_EC16as_v5, 
Standard_EC32as_v5, Standard_EC8ads_v5, Standard_EC16ads_v5, Standard_EC32ads_v5
[gw1] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
[gw1] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python

❌ test_instance_pools[True]: databricks.sdk.errors.platform.InvalidParameterValue: Node type Standard_D4pds_v6 is not supported. Supported node types: Standard_DS3_v2, Standard_DS4_v2, Standard_DS5_v2, Standard_D4s_v3, Standard_D8s_v3, Standard_D16s_v3, Standard_D32s_v3, Standard_D64s_v3, Standard_D4a_v4, Standard_D8a_v4, Standard_D16a_v4, Standard_D32a_v4, Standard_D48a_v4, Standard_D64a_v4, Standard_D96a_v4, Standard_D8as_v4, Standard_D16as_v4, Standard_D32as_v4, Standard_D48as_v4, Standard_D64as_v4, Standard_D96as_v4, Standard_D4ds_v4, Standard_D8ds_v4, Standard_D16ds_v4, Standard_D32ds_v4, Standard_D48ds_v4, Standard_D64ds_v4, Standard_D3_v2, Standard_D4_v2, Standard_D5_v2, Standard_D8_v3, Standard_D16_v3, Standard_D32_v3, Standard_D64_v3, Standard_D4as_v5, Standard_D8as_v5, Standard_D16as_v5, Standard_D32as_v5, Standard_D48as_v5, Standard_D64as_v5, Standard_D96as_v5, Standard_D4ads_v5, Standard_D8ads_v5, Standard_D16ads_v5, Standard_D32ads_v5, Standard_D48ads_v5, Standard_D64ads_v5, Standard_D96ads_v5, Standard_D4d_v4, Standard_D8d_v4, Standard_D16d_v4, Standard_D32d_v4, Standard_D48d_v4, Standard_D64d_v4, Standard_D12_v2, Standard_D13_v2, Standard_D14_v2, Standard_D15_v2, Standard_DS12_v2, Standard_DS13_v2, Standard_DS14_v2, Standard_DS15_v2, Standard_E8_v3, Standard_E16_v3, Standard_E32_v3, Standard_E64_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E64s_v3, Standard_E4d_v4, Standard_E8d_v4, Standard_E16d_v4, Standard_E20d_v4, Standard_E32d_v4, Standard_E48d_v4, Standard_E64d_v4, Standard_E4ds_v4, Standard_E8ds_v4, Standard_E16ds_v4, Standard_E20ds_v4, Standard_E32ds_v4, Standard_E48ds_v4, Standard_E64ds_v4, Standard_E80ids_v4, Standard_E4a_v4, Standard_E8a_v4, Standard_E16a_v4, Standard_E20a_v4, Standard_E32a_v4, Standard_E48a_v4, Standard_E64a_v4, Standard_E96a_v4, Standard_E4as_v4, Standard_E8as_v4, Standard_E16as_v4, Standard_E20as_v4, Standard_E32as_v4, Standard_E48as_v4, Standard_E64as_v4, Standard_E96as_v4, Standard_E4s_v4, 
Standard_E8s_v4, Standard_E16s_v4, Standard_E20s_v4, Standard_E32s_v4, Standard_E48s_v4, Standard_E64s_v4, Standard_E80is_v4, Standard_E4as_v5, Standard_E8as_v5, Standard_E16as_v5, Standard_E20as_v5, Standard_E32as_v5, Standard_E48as_v5, Standard_E64as_v5, Standard_E96as_v5, Standard_E4ads_v5, Standard_E8ads_v5, Standard_E16ads_v5, Standard_E20ads_v5, Standard_E32ads_v5, Standard_E48ads_v5, Standard_E64ads_v5, Standard_E96ads_v5, Standard_L4s, Standard_L8s, Standard_L16s, Standard_L32s, Standard_F4, Standard_F8, Standard_F16, Standard_F4s, Standard_F8s, Standard_F16s, Standard_H8, Standard_H16, Standard_F4s_v2, Standard_F8s_v2, Standard_F16s_v2, Standard_F32s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_NC12, Standard_NC24, Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC4as_T4_v3, Standard_NC8as_T4_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_ND96asr_v4, Standard_L8s_v2, Standard_L16s_v2, Standard_L32s_v2, Standard_L64s_v2, Standard_L80s_v2, Standard_NV36ads_A10_v5, Standard_NV36adms_A10_v5, Standard_NV72ads_A10_v5, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_D4s_v5, Standard_D8s_v5, Standard_D16s_v5, Standard_D32s_v5, Standard_D48s_v5, Standard_D64s_v5, Standard_D96s_v5, Standard_D4ds_v5, Standard_D8ds_v5, Standard_D16ds_v5, Standard_D32ds_v5, Standard_D48ds_v5, Standard_D64ds_v5, Standard_D96ds_v5, Standard_E4s_v5, Standard_E8s_v5, Standard_E16s_v5, Standard_E20s_v5, Standard_E32s_v5, Standard_E48s_v5, Standard_E64s_v5, Standard_E96s_v5, Standard_E4ds_v5, Standard_E8ds_v5, Standard_E16ds_v5, Standard_E20ds_v5, Standard_E32ds_v5, Standard_E48ds_v5, Standard_E64ds_v5, Standard_E96ds_v5, Standard_L8s_v3, Standard_L16s_v3, Standard_L32s_v3, Standard_L48s_v3, Standard_L64s_v3, Standard_L80s_v3, Standard_L8as_v3, Standard_L16as_v3, Standard_L32as_v3, Standard_L48as_v3, Standard_L64as_v3, Standard_L80as_v3, Standard_DC4as_v5, Standard_DC8as_v5, Standard_DC16as_v5, Standard_DC32as_v5, 
Standard_EC8as_v5, Standard_EC16as_v5, Standard_EC32as_v5, Standard_EC8ads_v5, Standard_EC16ads_v5, Standard_EC32ads_v5 (9.305s)

databricks.sdk.errors.platform.InvalidParameterValue: Node type Standard_D4pds_v6 is not supported. Supported node types: Standard_DS3_v2, Standard_DS4_v2, Standard_DS5_v2, Standard_D4s_v3, Standard_D8s_v3, Standard_D16s_v3, Standard_D32s_v3, Standard_D64s_v3, Standard_D4a_v4, Standard_D8a_v4, Standard_D16a_v4, Standard_D32a_v4, Standard_D48a_v4, Standard_D64a_v4, Standard_D96a_v4, Standard_D8as_v4, Standard_D16as_v4, Standard_D32as_v4, Standard_D48as_v4, Standard_D64as_v4, Standard_D96as_v4, Standard_D4ds_v4, Standard_D8ds_v4, Standard_D16ds_v4, Standard_D32ds_v4, Standard_D48ds_v4, Standard_D64ds_v4, Standard_D3_v2, Standard_D4_v2, Standard_D5_v2, Standard_D8_v3, Standard_D16_v3, Standard_D32_v3, Standard_D64_v3, Standard_D4as_v5, Standard_D8as_v5, Standard_D16as_v5, Standard_D32as_v5, Standard_D48as_v5, Standard_D64as_v5, Standard_D96as_v5, Standard_D4ads_v5, Standard_D8ads_v5, Standard_D16ads_v5, Standard_D32ads_v5, Standard_D48ads_v5, Standard_D64ads_v5, Standard_D96ads_v5, Standard_D4d_v4, Standard_D8d_v4, Standard_D16d_v4, Standard_D32d_v4, Standard_D48d_v4, Standard_D64d_v4, Standard_D12_v2, Standard_D13_v2, Standard_D14_v2, Standard_D15_v2, Standard_DS12_v2, Standard_DS13_v2, Standard_DS14_v2, Standard_DS15_v2, Standard_E8_v3, Standard_E16_v3, Standard_E32_v3, Standard_E64_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E64s_v3, Standard_E4d_v4, Standard_E8d_v4, Standard_E16d_v4, Standard_E20d_v4, Standard_E32d_v4, Standard_E48d_v4, Standard_E64d_v4, Standard_E4ds_v4, Standard_E8ds_v4, Standard_E16ds_v4, Standard_E20ds_v4, Standard_E32ds_v4, Standard_E48ds_v4, Standard_E64ds_v4, Standard_E80ids_v4, Standard_E4a_v4, Standard_E8a_v4, Standard_E16a_v4, Standard_E20a_v4, Standard_E32a_v4, Standard_E48a_v4, Standard_E64a_v4, Standard_E96a_v4, Standard_E4as_v4, Standard_E8as_v4, Standard_E16as_v4, Standard_E20as_v4, Standard_E32as_v4, Standard_E48as_v4, Standard_E64as_v4, Standard_E96as_v4, Standard_E4s_v4, Standard_E8s_v4, Standard_E16s_v4, 
Standard_E20s_v4, Standard_E32s_v4, Standard_E48s_v4, Standard_E64s_v4, Standard_E80is_v4, Standard_E4as_v5, Standard_E8as_v5, Standard_E16as_v5, Standard_E20as_v5, Standard_E32as_v5, Standard_E48as_v5, Standard_E64as_v5, Standard_E96as_v5, Standard_E4ads_v5, Standard_E8ads_v5, Standard_E16ads_v5, Standard_E20ads_v5, Standard_E32ads_v5, Standard_E48ads_v5, Standard_E64ads_v5, Standard_E96ads_v5, Standard_L4s, Standard_L8s, Standard_L16s, Standard_L32s, Standard_F4, Standard_F8, Standard_F16, Standard_F4s, Standard_F8s, Standard_F16s, Standard_H8, Standard_H16, Standard_F4s_v2, Standard_F8s_v2, Standard_F16s_v2, Standard_F32s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_NC12, Standard_NC24, Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC4as_T4_v3, Standard_NC8as_T4_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_ND96asr_v4, Standard_L8s_v2, Standard_L16s_v2, Standard_L32s_v2, Standard_L64s_v2, Standard_L80s_v2, Standard_NV36ads_A10_v5, Standard_NV36adms_A10_v5, Standard_NV72ads_A10_v5, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_D4s_v5, Standard_D8s_v5, Standard_D16s_v5, Standard_D32s_v5, Standard_D48s_v5, Standard_D64s_v5, Standard_D96s_v5, Standard_D4ds_v5, Standard_D8ds_v5, Standard_D16ds_v5, Standard_D32ds_v5, Standard_D48ds_v5, Standard_D64ds_v5, Standard_D96ds_v5, Standard_E4s_v5, Standard_E8s_v5, Standard_E16s_v5, Standard_E20s_v5, Standard_E32s_v5, Standard_E48s_v5, Standard_E64s_v5, Standard_E96s_v5, Standard_E4ds_v5, Standard_E8ds_v5, Standard_E16ds_v5, Standard_E20ds_v5, Standard_E32ds_v5, Standard_E48ds_v5, Standard_E64ds_v5, Standard_E96ds_v5, Standard_L8s_v3, Standard_L16s_v3, Standard_L32s_v3, Standard_L48s_v3, Standard_L64s_v3, Standard_L80s_v3, Standard_L8as_v3, Standard_L16as_v3, Standard_L32as_v3, Standard_L48as_v3, Standard_L64as_v3, Standard_L80as_v3, Standard_DC4as_v5, Standard_DC8as_v5, Standard_DC16as_v5, Standard_DC32as_v5, Standard_EC8as_v5, Standard_EC16as_v5, 
Standard_EC32as_v5, Standard_EC8ads_v5, Standard_EC16ads_v5, Standard_EC32ads_v5
[gw4] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
[gw4] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python

Flaky tests:

🤪 test_running_real_remove_backup_groups_job (5m39.202s)

Running from acceptance #6861

nfx

Please simplify the toposort algorithm with the provided outline and use Ownership subclasses to get the right owner name.

src/databricks/labs/ucx/sequencing/sequencing.py

nfx · 2024-10-16T15:15:15Z

src/databricks/labs/ucx/sequencing/sequencing.py

+            node_id=0, object_type="ROOT", object_id="ROOT", object_name="ROOT", object_owner="NONE"
+        )
+
+    def register_workflow_task(self, task: jobs.Task, job: jobs.Job, _graph: DependencyGraph) -> MigrationNode:


Suggested change

def register_workflow_task(self, task: jobs.Task, job: jobs.Job, _graph: DependencyGraph) -> MigrationNode:

def _register_workflow_task(self, task: jobs.Task, job: jobs.Job, graph: DependencyGraph) -> MigrationNode:

make this one private. we should start from the job, not the task

The current thinking is to leverage the existing dependency graph in WorkflowLinter.refresh_report, in order to avoid rebuilding it and re-fetching all assets from the workspace. Starting from the job would make that impossible.

nfx · 2024-10-16T15:15:32Z

src/databricks/labs/ucx/sequencing/sequencing.py

+    def __init__(self, ws: WorkspaceClient):
+        self._ws = ws
+        self._root = MigrationNode(
+            node_id=0, object_type="ROOT", object_id="ROOT", object_name="ROOT", object_owner="NONE"


Suggested change

node_id=0, object_type="ROOT", object_id="ROOT", object_name="ROOT", object_owner="NONE"

node_id=0, object_type="ROOT", object_id="ROOT", object_name="ROOT", object_owner="NONE",

make fmt

make fmt doesn't change this code...

with trailing comma it should

that line is gone anyway

nfx · 2024-10-16T15:34:10Z

src/databricks/labs/ucx/sequencing/sequencing.py

+            object_name=task.task_key,
+            object_owner=job_node.object_owner,  # no task owner so use job one
+        )
+        job_node.required_steps.append(task_node)


this algorithm is a bit convoluted and hard to follow. please rewrite it with an adjacency map and do the topological sorting with Kahn's algorithm in this class, e.g.:

    self._adjacency = collections.defaultdict(set)
    ...
    # have a central _nodes field to have all the nodes addressable by (TYPE, ID) tuple
    self._nodes[('TASK', task_id)] = MigrationNode(
        node_id=self._last_node_id,
        object_type="TASK",
        object_id=task_id,
        object_name=task.task_key,
        object_owner=job_node.object_owner,
    )
    # the adjacency values are sets, so use add() rather than append()
    self._adjacency[('JOB', job_id)].add(('TASK', task_id))

... and actual toposort is pretty straightforward with Kahn's algo:

    indegrees = collections.defaultdict(int)
    dependents = collections.defaultdict(set)  # reverse map: who is waiting for a node
    for src, dep_set in self._adjacency.items():
        indegrees[src] = len(dep_set)  # count incoming dependencies for a node
        for dep in dep_set:
            indegrees[dep] += 0  # also track nodes that only appear as dependencies
            dependents[dep].add(src)
    queue, toposorted, sequence_num = collections.deque(), [], 0
    for src, incoming in indegrees.items():
        if incoming > 0:
            continue
        queue.append(src)  # start with zero-dependencies nodes
    while queue:
        curr = queue.popleft()
        toposorted.append(replace(self._nodes[curr], sequence_num=sequence_num))
        sequence_num += 1
        for src in dependents[curr]:
            indegrees[src] -= 1
            if indegrees[src] == 0:
                queue.append(src)
    return toposorted

i find it confusing with all those _deduplicate_steps and _find_node methods.
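
For reference, the suggestion above can be tried outside the sequencer class. The following is only a minimal sketch, not the code from this PR: `Node` is a hypothetical stand-in for `MigrationNode`, `toposort` is an illustrative name, and `depends_on` is assumed to map each `(TYPE, ID)` key to the set of keys it depends on, with every key present in `nodes`.

```python
import collections
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:  # hypothetical stand-in for MigrationNode
    object_type: str
    object_id: str
    sequence_num: int = -1


def toposort(nodes, depends_on):
    """Order nodes so that each one comes after the nodes it depends on.

    nodes: dict mapping a (TYPE, ID) key to a Node
    depends_on: dict mapping a key to the set of keys it depends on
    """
    # Number of dependencies that are not sequenced yet, per node.
    pending = {key: len(depends_on.get(key, ())) for key in nodes}
    # Reverse map: which nodes are waiting for a given node.
    dependents = collections.defaultdict(set)
    for key, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(key)
    # Start with the nodes that have no dependencies (sorted for a stable order).
    queue = collections.deque(sorted(key for key, count in pending.items() if count == 0))
    ordered = []
    while queue:
        curr = queue.popleft()
        ordered.append(replace(nodes[curr], sequence_num=len(ordered)))
        for waiting in sorted(dependents[curr]):
            pending[waiting] -= 1
            if pending[waiting] == 0:
                queue.append(waiting)
    return ordered


nodes = {
    ("JOB", "1"): Node("JOB", "1"),
    ("TASK", "1/a"): Node("TASK", "1/a"),
    ("CLUSTER", "c"): Node("CLUSTER", "c"),
}
depends_on = {
    ("JOB", "1"): {("TASK", "1/a")},
    ("TASK", "1/a"): {("CLUSTER", "c")},
}
steps = toposort(nodes, depends_on)
print([(step.object_type, step.sequence_num) for step in steps])
# [('CLUSTER', 0), ('TASK', 1), ('JOB', 2)]
```

Nodes that sit on a dependency cycle never reach a pending count of zero, so this plain version silently leaves them out of the result.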

The problem with Kahn is it only works for DAGs, which we can't guarantee. We have duplicates and recursive loops. I'm trying it out but I suspect it will break on solacc.

where do we have recursive loops? incomplete DAGs would also be fine, as nodes without dependencies would always be first

Consider the following code:

file A.py:

    class A:
        def return_a_bee(self):
            from B import B
            return B()

file B.py:

    from A import A
    class B(A):
        pass

file T.py:

    from A import A
    from B import B

the above will return the following dependency graph fragment:

          T
         / \
        A   B
       /     \
      B       A

When building the dependency graph we detect the recursive cycle (and we break it).
In the above, there is no 0-dependency node, so not sure Kahn will work...

I've implemented Kahn. I'll check whether it fails in PR for phase 2 (which deals with the dependency graph)
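
To make the concern about cycles concrete, here is a hedged sketch of one way a Kahn-style sort could be adapted so that it still returns every node when no zero-dependency node exists. It is not the implementation from this PR: the function name and the tie-breaking rule (pick the stuck node with the fewest unmet dependencies, then by name) are assumptions chosen for illustration.

```python
def toposort_with_cycles(depends_on):
    """Return all keys in dependency order, breaking cycles when stuck.

    depends_on maps each key to the set of keys it depends on.
    """
    # Collect every key, including those that only appear as dependencies.
    keys = set(depends_on)
    for deps in depends_on.values():
        keys.update(deps)
    # Unmet dependencies per key (copied, so the input is left untouched).
    remaining = {key: set(depends_on.get(key, ())) for key in keys}
    ordered = []
    while remaining:
        ready = sorted(key for key, deps in remaining.items() if not deps)
        if not ready:
            # Every remaining node waits on another one, so there is a cycle.
            # Break it by releasing the node with the fewest unmet dependencies.
            ready = [min(remaining, key=lambda k: (len(remaining[k]), k))]
        for key in ready:
            ordered.append(key)
            del remaining[key]
        for deps in remaining.values():
            deps.difference_update(ready)
    return ordered


# The A/B/T example from the discussion: T needs A and B, which need each other.
print(toposort_with_cycles({"T": {"A", "B"}, "A": {"B"}, "B": {"A"}}))
# ['A', 'B', 'T']
```

This variant rescans the remaining nodes on every pass, so it is quadratic in the worst case; it trades speed for showing the cycle-breaking step plainly.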

nfx · 2024-10-16T15:37:06Z

src/databricks/labs/ucx/sequencing/sequencing.py

+            object_type="JOB",
+            object_id=str(job.job_id),
+            object_name=job_name,
+            object_owner=job.creator_user_name or "<UNKNOWN>",


inject and use databricks.labs.ucx.framework.owners.Ownership subclasses to properly determine object owners.

done. Requires #2999

#2999 no longer needed

nfx · 2024-10-17T11:58:42Z

src/databricks/labs/ucx/sequencing/sequencing.py

+            object_name=task.task_key,
+            object_owner=job_node.object_owner,  # no task owner so use job one
+        )
+        job_node.required_steps.append(task_node)


where do we have recursive loops? incomplete DAGs would also be fine, as nodes without dependencies would always be first

nfx · 2024-10-17T12:00:30Z

src/databricks/labs/ucx/sequencing/sequencing.py

+            object_type="JOB",
+            object_id=str(job.job_id),
+            object_name=job_name,
+            object_owner=JobOwnership(self._admin_locator).owner_of(job),


inject instance of JobOwnership via constructor, it has to be on GlobalContext

It's not on GlobalContext, and I guess making that happen deserves a dedicated PR?
Can we make this change later?
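
The request to inject the ownership object through the constructor, rather than building it where it is used, can be illustrated with a small sketch. All names below (`OwnerResolver`, `FakeJob`, `CreatorOrAdmin`, `Sequencer`) are hypothetical stand-ins and do not reflect the actual UCX `Ownership` API or `GlobalContext` wiring.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class OwnerResolver(Protocol):  # hypothetical interface, not the UCX Ownership API
    def owner_of(self, record: FakeJob) -> str: ...


@dataclass
class FakeJob:  # hypothetical stand-in for jobs.Job
    job_id: int
    creator_user_name: str | None = None


class CreatorOrAdmin:
    """Toy resolver: use the creator when known, else fall back to an admin."""

    def __init__(self, admin: str):
        self._admin = admin

    def owner_of(self, record: FakeJob) -> str:
        return record.creator_user_name or self._admin


class Sequencer:
    def __init__(self, job_ownership: OwnerResolver):
        # The resolver is handed in by the caller instead of being created here.
        self._job_ownership = job_ownership

    def owner_for(self, job: FakeJob) -> str:
        return self._job_ownership.owner_of(job)


sequencer = Sequencer(CreatorOrAdmin(admin="workspace-admin"))
print(sequencer.owner_for(FakeJob(job_id=1)))
print(sequencer.owner_for(FakeJob(job_id=2, creator_user_name="eric")))
```

With this shape, a test can pass in a fake resolver, and whatever object assembles the application decides which resolver instance the sequencer receives.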

nfx · 2024-10-17T12:02:49Z

src/databricks/labs/ucx/assessment/clusters.py

+    This is the cluster creator (if known).
+    """
+
+    def _maybe_direct_owner(self, record: ClusterDetails) -> str | None:


convert from ClusterDetails to ClusterInfo and don't do any renames/new classes. we have "cluster" as a concept within our system, ClusterDetails and ClusterInfo are just the names of API generated classes which are slightly different views on the cluster entity.

TLDR: don't rename existing ownership classes.

# Conflicts: # src/databricks/labs/ucx/source_code/jobs.py # tests/integration/hive_metastore/test_catalog_schema.py # tests/unit/hive_metastore/test_table_migrate.py

nfx

re-rebase

nfx · 2024-10-17T16:13:18Z

CHANGELOG.md

@@ -1,25 +1,5 @@
 # Version changelog

-## 0.45.0


bad rebase, most likely

ericvergnaud · 2024-10-17T16:40:39Z

Superseded by #3008

## Changes

Add a `MigrationSequencer` class to sequence the migration steps for jobs. The PR includes the following resources in its sequence:

- Jobs
- Job tasks
- Job tasks dependencies
- Job clusters
- Cluster

Other elements part of the sequence are added later

### Linked issues

Progresses #1415
Supersedes #2980

### Tests

- [x] added unit tests
- [x] added integration tests

---------

Co-authored-by: Eric Vergnaud <[email protected]>
Co-authored-by: Cor Zuurmond <[email protected]>

* Added `MigrationSequencer` for jobs ([#3008](#3008)). In this commit, a `MigrationSequencer` class has been added to manage the migration sequence for various resources including jobs, job tasks, job task dependencies, job clusters, and clusters. The class builds a graph of dependencies and analyzes it to generate the migration sequence, which is returned as an iterable of `MigrationStep` objects. These objects contain information about the object type, ID, name, owner, required step IDs, and step number. The commit also includes new unit and integration tests to ensure the functionality is working correctly. The migration sequence is used in tests for assessing the sequencing feature, and it handles tasks that reference existing or non-existing clusters or job clusters, and new cluster definitions. This change is linked to issue [#1415](#1415) and supersedes issue [#2980](#2980). Additionally, the commit removes some unnecessary imports and fixtures from a test file.
* Added `phik` to known list ([#3198](#3198)). In this release, we have added `phik` to the known list in the provided JSON file. This change addresses part of issue [#1931](#1931), as outlined in the linked issues. The `phik` key has been added with an empty list as its value, consistent with the structure of other keys in the JSON file. It is important to note that no existing functionality has been altered and no new methods have been introduced in this commit. The scope of the change is confined to updating the known list in the JSON file by adding the `phik` key.
* Added `pmdarima` to known list ([#3199](#3199)). In this release, we have added `pmdarima`, an open-source Python library for time series analysis that provides an equivalent of R's `auto.arima`, to our known list of libraries. The library is particularly useful for fitting ARIMA models and testing for seasonality. This change partly resolves issue [#1931](#1931).
* Added `preshed` to known list ([#3220](#3220)). A new library, `preshed`, has been added to the known list of supported libraries. Written in Cython, `preshed` provides hash tables that assume their keys are pre-hashed. With the inclusion of two modules, `preshed` and `preshed.about`, this addition partially resolves issue [#1931](#1931).
* Added `py-cpuinfo` to known list ([#3221](#3221)). In this release, we have added support for the `py-cpuinfo` library to our project, enabling the use of the `cpuinfo` functionality that it provides. With this addition, developers can now access detailed information about the CPU, such as the number of cores, current frequency, and vendor, which can be useful for performance tuning and optimization. This change partially resolves issue [#1931](#1931) and does not affect any existing functionality or add new methods to the codebase.
* Cater for empty python cells ([#3212](#3212)). In this release, we have resolved an issue where certain notebook cells in the dependency builder were causing crashes. Specifically, empty or comment-only cells were identified as the source of the problem. To address this, we have implemented a check to account for these cases, ensuring that an empty tree is stored in the `_python_trees` dictionary if the input cell does not produce a valid tree. Furthermore, we have added a test to verify the fix on a failed repository. If a cell does not produce a tree, the `_load_children_from_tree` method will not be executed for that cell, skipping the loading of any children trees.
* Create `TODO` issues every nightly run ([#3196](#3196)). A commit has been made to update the `acceptance` repository version in the `acceptance.yml` GitHub workflow from `acceptance/v0.4.0` to `acceptance/v0.4.2`, which affects the integration tests. The `Run nightly tests` step in the GitHub repository's workflow has also been updated to use a newer version of the `databrickslabs/sandbox/acceptance` action, from `v0.3.1` to `v0.4.2`. Software engineers should verify that the new version of the `acceptance` repository contains all necessary updates and fixes, and that the integration tests continue to function as expected. Additionally, testing the updated action is important to ensure that the nightly tests run successfully with up-to-date code and can catch potential issues.
* Fixed Integration test failure of migration_tables ([#3108](#3108)). This release includes a fix for two integration tests (`test_migrate_managed_table_to_external_table_without_conversion` and `test_migrate_managed_table_to_external_table_with_clone`) related to Hive Metastore table migration, addressing issues [#3054](#3054) and [#3055](#3055). Previously skipped due to underlying problems, these tests have now been unskipped, enhancing the migration feature's test coverage. No changes have been made to the existing functionality, as the focus is solely on including the previously skipped tests in the testing suite. The changes involve removing `@pytest.mark.skip` markers from the test functions. In addition, this release includes an update to DirectFsAccess integration tests, addressing issues related to the removal of DFSA collectors and ensuring proper handling of different file types, with no modifications made to other parts of the codebase.
* Replace MockInstallation with MockPathLookup for testing fixtures ([#3215](#3215)). In this release, we have updated the testing fixtures in our unit tests by replacing the `MockInstallation` class with `MockPathLookup`. Specifically, we have modified the `_load_sources` function to use `MockPathLookup` instead of `MockInstallation` for loading sources. This change also introduces a new logger, `logger`, for more precise logging within the module. Additionally, we have updated the `_load_sources` function calls in the `test_notebook.py` file to pass the file path directly instead of a `SourceContainer` object. This modification allows for more flexible and straightforward testing of file-related functionality, thereby fixing issue [#3115](#3115).
* Updated sqlglot requirement from <25.29,>=25.5.0 to >=25.5.0,<25.30 ([#3224](#3224)). The open-source library `sqlglot` has been updated to version 25.29.0 with this release, incorporating several breaking changes, new features, and bug fixes. The breaking changes include transpiling `ANY` to `EXISTS`, supporting the `MEDIAN()` function, wrapping values in `NOT value IS ...`, and parsing information schema views into a single identifier. New features include support for the `JSONB_EXISTS` function in PostgreSQL, transpiling `ANY` to `EXISTS` in Spark, transpiling Snowflake's `TIMESTAMP()` function, and adding support for hexadecimal literals in Teradata. Bug fixes include handling a Move edge case in the semantic differ, adding a `NULL` filter on `ARRAY_AGG` only for columns, improving parsing of `WITH FILL ... INTERPOLATE` in Clickhouse, generating `LOG(...)` for `exp.Ln` in TSQL, and optionally parsing a Stream expression. The full changelog can be found in the pull request, which also includes a list of the commits included in this release.
* Use acceptance/v0.4.0 ([#3192](#3192)). A change has been made to the GitHub Actions workflow file for acceptance tests, updating the version of the `databrickslabs/sandbox/acceptance` runner to `acceptance/v0.4.0` and granting write permissions for the `issues` field in the `permissions` section. These updates will allow for the use of the latest version of the acceptance tests and provide the necessary permissions to interact with issues. A `TODO` comment has been added to indicate that the new version of the acceptance tests needs to be updated elsewhere in the codebase.
* Warn about errors instead to avoid job task failure ([#3219](#3219)). In this change, the `refresh_report` method in `jobs.py` has been updated to log warnings instead of raising errors when certain problems are encountered during its execution. Previously, if there were any errors during the linting process, a `ManyError` exception was raised, causing the job task to fail. Now, errors are logged as warnings, allowing the job task to continue running successfully. This resolves issue [#3214](#3214). The updated method checks for errors during the linting process, adds them to a list, and constructs a string of error messages if there are any. This string of error messages is then logged as a warning using the `logger.warning` function, allowing the method to continue executing and the job task to complete successfully.
* [DOC] Add dashboard section ([#3222](#3222)). In this release, we have added a new dashboard section to the project documentation, which provides visualizations of UCX's outcomes to help users better understand and manage their UCX environment. The new section includes a table listing the available dashboards, including the Azure service principals dashboard. This dashboard displays information about Azure service principals discovered by UCX in configurations from various sources such as clusters, cluster policies, job clusters, pipelines, and warehouses. Each dashboard has text widgets that offer detailed information about the contents and are designed to help users understand UCX's results and progress in a more visual and interactive way.
* [DOC] README.md rewrite ([#3211](#3211)). The Databricks Labs UCX package offers a suite of tools for migrating data objects from the Hive metastore to Unity Catalog (UC), encompassing a comprehensive table migration process. This process consists of table mapping, data access setup, creating new UC resources, and migrating Hive metastore data objects. Table mapping is achieved using a table mapping file that defaults to mapping all tables/views to UC tables while preserving the original schema and names, but can be customized as needed. Data access setup involves creating and modifying cloud principals and credentials for UC data. New UC resources are created without affecting existing Hive metastore resources, and users can choose from various strategies for migrating tables based on their format and location. Additionally, the package provides installation resources, including a README notebook, a DEBUG notebook, debug logs, and installation configuration, as well as utility commands for viewing and repairing workflows. The migration process also includes an assessment workflow, group migration workflow, data reconciliation, and code migration commands.
* [chore] Added tests to verify linter not being stuck in the infinite loop ([#3225](#3225)). In this release, we have added new functional tests to ensure that the linter does not get stuck in an infinite loop, addressing a bug that was fixed in version 0.46.0 related to the default format change from Parquet to Delta in Databricks Runtime 8.0 and a SQL parse error. These tests involve creating data frames, writing them to tables, and reading from those tables, using PySpark's SQL functions and a system information schema table to demonstrate the corrected behavior. The tests also include SQL queries that select columns from a system information schema table with a specified limit, using a `withColumn()` method to add a new column to a data frame based on a condition.
* [internal] Temporarily disable integration tests due to ES-1302145 ([#3226](#3226)). In this release, the integration tests for moving tables, views, and aliasing tables have been temporarily disabled due to issue ES-1302145. The `test_move_tables`, `test_move_views`, and `test_alias_tables` functions were previously decorated with `@retried` to handle potential `NotFound` exceptions and had a timeout of 2 minutes, but are now marked with `@pytest.mark.skip("ES-1302145")`. Once the issue is resolved, the `@pytest.mark.skip` decorator should be removed to re-enable the tests. The remaining code in the file, including the `test_move_tables_no_from_schema`, `test_move_tables_no_to_schema`, and `test_move_views_no_from_schema` functions, is unchanged and still functional.
* use a path instance for MISSING_SOURCE_PATH and add test ([#3217](#3217)). In this release, the handling of `MISSING_SOURCE_PATH` has been improved by replacing the string representation with a `Path` instance using `pathlib`, which simplifies checks for missing source paths and enables the addition of a new test for the `DependencyProblem` class. This test verifies the behavior of the newly introduced method, `is_path_missing()`, in the `DependencyProblem` class for determining if a given problem is caused by a missing path. Co-authored by Eric Vergnaud.

Dependency updates:

* Updated sqlglot requirement from <25.29,>=25.5.0 to >=25.5.0,<25.30 ([#3224](#3224)).

Eric Vergnaud added 7 commits October 16, 2024 10:51

make simple_dependency_resolver available more broadly

8eba903

build migration steps for workflow task

5f65831

fix pylint warnings

52c5495

fix pylint warnings

1860917

add object name

ae23d20

populate object owner

9c63b8b

be more defensive

a1734b5

ericvergnaud requested a review from a team as a code owner October 16, 2024 10:56

ericvergnaud had a problem deploying to account-admin October 16, 2024 10:56 — with GitHub Actions Failure

ericvergnaud had a problem deploying to account-admin October 16, 2024 11:56 — with GitHub Actions Failure

nfx requested changes Oct 16, 2024

move last_node_id to sequencer

872d74c

ericvergnaud had a problem deploying to account-admin October 17, 2024 08:53 — with GitHub Actions Error

Eric Vergnaud added 6 commits October 17, 2024 11:27

rename JobOwnership to JobIinfoOwnership and add JobOwnership

18acdc0

rename ClusterOwnership to ClusterInfoOwnership

b7b0bb2

add ClusterDetailsOwnership

d0a6f6d

formatting

3a411c6

Merge branch 'main' into more-ownership-classes

5b15124

Merge branch 'more-ownership-classes' into migration-sequencing-step-1

73947c2

ericvergnaud temporarily deployed to account-admin October 17, 2024 09:53 — with GitHub Actions Inactive

Use 'Ownership' classes

082602b

ericvergnaud temporarily deployed to account-admin October 17, 2024 10:16 — with GitHub Actions Inactive

nfx requested changes Oct 17, 2024

sort using adapted Kahn algo

bb56fba

ericvergnaud temporarily deployed to account-admin October 17, 2024 13:15 — with GitHub Actions Inactive

Eric Vergnaud added 3 commits October 17, 2024 15:57

revert merge

bfce474

Merge branch 'main' into migration-sequencing-step-1

07bcab5

# Conflicts: # src/databricks/labs/ucx/source_code/jobs.py # tests/integration/hive_metastore/test_catalog_schema.py # tests/unit/hive_metastore/test_table_migrate.py

fix merge issues

d0a957b

ericvergnaud had a problem deploying to account-admin October 17, 2024 14:23 — with GitHub Actions Error

use existing ownership classes

4ee30a0

ericvergnaud had a problem deploying to account-admin October 17, 2024 14:26 — with GitHub Actions Failure

nfx requested changes Oct 17, 2024

Merge branch 'main' into migration-sequencing-step-1

5fbd3bb

ericvergnaud had a problem deploying to account-admin October 17, 2024 16:15 — with GitHub Actions Failure

ericvergnaud mentioned this pull request Oct 17, 2024

Add MigrationSequencer for jobs #3008

Merged

2 tasks

ericvergnaud closed this Oct 17, 2024

nfx mentioned this pull request Oct 21, 2024

Crawlers: append snapshots to history journal, if available #2743

Merged

3 tasks

nfx mentioned this pull request Nov 8, 2024

Release v0.49.0 #3228

Merged
